Changing distribution and abundance of small pelagics may drive changes in predator distributions, affecting predator availability to fisheries and surveys. However, small pelagic fish are difficult to survey directly, so we developed a novel method of assessing small pelagic fish aggregate abundance via predator diet data. We used piscivore diet data collected from multiple bottom trawl surveys within a Vector Autoregressive Spatio-Temporal (VAST) model to assess trends of small pelagics on the Northeast US shelf. The goal was to develop a spatial “forage index” to inform survey and/or fishery availability in the bluefish (Pomatomus saltatrix) stock assessment. Using spring and fall surveys from 1973-2020, 20 small pelagic groups were identified as major bluefish prey using the diet data. Then, predators were grouped by diet similarity to identify 19 piscivore species with the most similar diet to bluefish in the region. Diets from all 20 piscivores were combined for the 20 prey groups at each surveyed location, and the total weight of small pelagic prey per predator stomach at each location was input into a Poisson-link delta model to estimate expected prey mass per predator stomach. Best fit models included spatial and spatio-temporal random effects, with predator mean length, number of predator species, and sea surface temperature as catchability covariates. Spring and fall prey indices were split into inshore and offshore areas to reflect changing prey availability over time in areas available to the recreational fishery and the bottom trawl survey, and also to contribute to regional ecosystem reporting.
The objective of this work was to create a “prey index” to evaluate changes in bluefish prey over time and in space using Vector Autoregressive Spatio-Temporal modeling (VAST (Thorson and Barnett, 2017; Thorson, 2019)). This approach was patterned on Ng et al. (2021), which used predator stomach data to create a biomass index for a single prey, Atlantic herring. Expected biomass of herring per stomach was estimated in VAST with 2 linear predictors, the number of herring per stomach and the average weight of herring in a stomach.
We adapted the approach of Ng et al. (2021) to get an index for “bluefish prey” in aggregate rather than a single prey species. Further, we include inshore and offshore regions by combining results across regional bottom trawl surveys surveys as was done for summer flounder biomass in Perretti and Thorson (2019). Finally, since bluefish themselves are somewhat sparsely sampled by the surveys, we aggregate all predators that have a similar diet composition to bluefish to better represent bluefish prey biomass.
We characterize mean weight of bluefish prey from all piscivores caught at each survey location and model that over time/space. Covariates potentially affecting perceived patterns in the bluefish prey index include number of predators, size composition of predators, and sea surface temperature (SST) at each survey location.
Therefore, the steps involved to estimate the forage index included defining the input dataset, and running multiple configurations of the VAST model. Steps involved in defining the dataset included defining “bluefish prey”, defining a set of piscivore predators with similar diets to bluefish, integrating diet data from two regional surveys, and integrating supplementary SST data to fill gaps in in-situ temperature data measurements. Steps involved in running the VAST model included decisions on spatial footprint, model structure, model selection to determine if spatial and spatio-temporal random effects were supported by the data, and further model selection to determine which catchability covariates were best supported by the data. Finally, subsets of the spatial domain were defined to match bluefish assessment inputs (survey and recreational fishery CPUE) for potential use as covariates in bluefish stock assessment models, and a bias-corrected (Thorson and Kristensen, 2016) forage index for each spatial subset was generated.
This approach is generalizable to any spatial subset of the full VAST spatial domain, such that forage indices for each ecoregion on the Northeast US continental shelf, or for other piscivore predators can also be generated.
Fish food habits data are collected aboard several regional fishery independent surveys in the Northeast US. The longest time series of diet has been collected by the Northeast Fisheries Science Center (NEFSC) bottom trawl survey (Smith and Link, 2010) from south of Cape Hatteras, NC to the Scotian Shelf at the US/Canada border since the early 1970s, which represents the bulk of the data for this analysis. Using similar survey protocols to the NEFSC survey, the NorthEast Area Monitoring and Assessment Program (NEAMAP) survey has collected fish diet data in inshore waters from Cape Hatteras, NC to Cape Cod, MA since 2008. All analyses were completed in R (Team, 2021).
Bluefish eat small pelagics that are not well sampled by bottom trawl surveys. Bluefish themselves are not well sampled by bottom trawl surveys. Nevertheless, the diet samples collected for bluefish indicate that anchovies, herrings, squids, butterfish, scup, and small hakes are important prey. See working paper #XX, as well as this web summary page link for NEFSC survey and this presentation link for NEAMAP survey bluefish diet composition summaries.
Using all sampled bluefish stomachs included in the NEFSC food habits database 1973-2020, we created a list of all pelagic nekton bluefish prey that had at least 10 observations (Table 1). Broad categories such as empty stomach, fish unidentified, Osteichthyes, unidentified animal remains, and blown stomach were not included in the prey list.
Prey name | Bluefish stomachs (n) |
LOLIGO SP | 423 |
ENGRAULIDAE | 407 |
ANCHOA MITCHILLI | 321 |
PEPRILUS TRIACANTHUS | 307 |
CEPHALOPODA | 262 |
ANCHOA HEPSETUS | 171 |
ETRUMEUS TERES | 126 |
AMMODYTES SP | 96 |
STENOTOMUS CHRYSOPS | 69 |
MERLUCCIUS BILINEARIS | 53 |
ILLEX SP | 39 |
CLUPEA HARENGUS | 37 |
CLUPEIDAE | 30 |
POMATOMUS SALTATRIX | 22 |
ENGRAULIS EURYSTOLE | 18 |
LOLIGO PEALEII | 17 |
SCOMBER SCOMBRUS | 14 |
PLEURONECTIFORMES | 13 |
CYNOSCION REGALIS | 12 |
BREVOORTIA TYRANNUS | 10 |
The choice of predators is largely intended to balance increasing sample size for modeling bluefish prey with using predators likely to be foraging similarly to bluefish. One extreme assumption would be to include only bluefish as predators, but there are relatively few bluefish diet samples due to incomplete availability to bottom trawl surveys (see supplemental information). This would miss prey available to bluefish because we have not sampled bluefish adequately. The opposite extreme assumption would be to include all stomachs that contain any of the top bluefish prey, regardless of which species ate the prey. This would include predators that do not forage similarly to bluefish and might therefore “count” prey that are not actually available to bluefish due to habitat differences. The intermediate approach which selects a group of piscivores that forage similarly to bluefish both increases sample size and screens out the most dissimilar predators to bluefish. A further refinement to the input data is only using the prey items identified above as “bluefish prey” across all predators identified as piscivores.
For bluefish forage index modeling, we are selecting a set of predators that have high diet similarity to bluefish. Garrison and Link (2000) evaluated similarity of predator diets on the Northeast US shelf to develop foraging guilds. The Schoener similarity index (Schoener, 1970) was applied
to assess the dietary overlap, \(D_{ij}\), between predator pairs:
\[ D_{i,j} = 1 – 0.5 (∑ |p_{i,k} – p_{j,k}|) \]
where \(p_{i,k}\) = mean proportional volume of prey type \(k\) in predator \(i\) and \(p_{i,k}\) = mean proportional volume of prey type \(k\) in predator \(j\) (Garrison and Link, 2000).
Garrison and Link (2000) used NEFSC bottom trawl survey data 1973-1997. We are using diets from 1985-2020 to characterize the forage index. Therefore an additional 20+ years of diet information is available to assess whether predator diet similarity has changed. Diet similarity analysis has been completed (B. Smith, pers. comm.), with a table of prey similarity available via this link on the NEFSC food habits shiny app that was used to define feeding guilds based on 50 predators. We evaluated results from several clustering algorithms (see this link) to determine how robustly different sets of predators grouped with bluefish. The working group selected the piscivore list based on the “complete” clustering algorithm, including all species that clustered with all 3 sizes of bluefish (Table 2).
Predator name | Size category | Minimum length (cm) | Maximum length (cm) |
ATLANTIC COD | XL | 81 | 150 |
ATLANTIC HALIBUT | M | 31 | 60 |
ATLANTIC HALIBUT | L | 61 | 90 |
BLUEFISH | S | 3 | 30 |
BLUEFISH | M | 31 | 70 |
BLUEFISH | L | 71 | 118 |
BUCKLER DORY | M | 21 | 50 |
CUSK | L | 51 | 104 |
FOURSPOT FLOUNDER | L | 41 | 49 |
GOOSEFISH | S | 5 | 30 |
GOOSEFISH | M | 31 | 60 |
GOOSEFISH | L | 61 | 90 |
GOOSEFISH | XL | 91 | 124 |
LONGFIN SQUID | S | 1 | 15 |
LONGFIN SQUID | M | 16 | 30 |
NORTHERN SHORTFIN SQUID | S | 3 | 15 |
NORTHERN SHORTFIN SQUID | M | 16 | 30 |
POLLOCK | L | 51 | 80 |
POLLOCK | XL | 81 | 120 |
RED HAKE | L | 41 | 98 |
SEA RAVEN | S | 4 | 25 |
SEA RAVEN | M | 26 | 50 |
SEA RAVEN | L | 51 | 68 |
SILVER HAKE | M | 21 | 40 |
SILVER HAKE | L | 41 | 76 |
SPINY DOGFISH | M | 36 | 79 |
SPINY DOGFISH | L | 80 | 117 |
SPOTTED HAKE | M | 21 | 40 |
STRIPED BASS | M | 31 | 70 |
STRIPED BASS | L | 71 | 120 |
SUMMER FLOUNDER | M | 21 | 40 |
SUMMER FLOUNDER | L | 41 | 70 |
THORNY SKATE | XL | 81 | 108 |
WEAKFISH | M | 26 | 50 |
WHITE HAKE | M | 21 | 40 |
WHITE HAKE | L | 41 | 136 |
This piscivore dataset better captured predators that feed similarly to bluefish (e.g. striped bass), and has a higher proportion of stations with bluefish prey than a dataset based on the Garrison and Link (2000) piscivore definition. We also evaluated a piscivore definition using only the predators that always cluster with bluefish no matter what clustering algorithm is applied. However, a dataset based on that limited piscivore list excluded predators highlighted by bluefish experts (e.g., striped bass) and resulted in the inclusion of fewer overall stations than the either of the above piscivore definitions, with a lower proportion of included stations with bluefish prey.
The NEAMAP survey operates closer to shore than the current NEFSC survey. While both surveys capture many of the same predators, some are not available close to shore and others are more available close to shore. The NEAMAP dataset includes the following piscivore predators and size ranges, adding two species not captured by the NEFSC survey offshore but included based on working group expert judgement of prey similarity to bluefish (Spanish mackerel and spotted sea trout) and leaving out those not captured inshore:
For each survey dataset, the full diet database was filtered to include only predators on the list of piscivores with the most diet similarity to bluefish (Table 2). Then, the list of bluefish prey (Table 1) was used to categorize prey items for each predator as “bluefish prey” or “other prey”. Each station was given a unique station identifier (cruise and station number), and the total weight (g) of bluefish prey at each station was summed. Total bluefish prey weight was divided by the total number of stomachs across all piscivore predators at the station to get mean bluefish prey weight (g) at each station. In addition, the number of piscivore species and the mean size (total length, cm) across all piscivores was calculated at each station. Seasons were identified as “Spring” (collection months January - June) and “Fall” (collection months July-December) to align with assumptions used in the bluefish stock assessment. Vessel identifiers were assigned based on years and survey, with two vessels used for the NEFSC survey (R/V Albatross prior to 2009 and R/V Bigelow 2009 to present) and a single vessel used for the NEAMAP survey 2008-present. These were used to evaluate whether vessel effects were present. Variable names were reconciled between NEFSC and NEAMAP, and the datasets were appended into a single dataset with one row per station including station ID, year, season, date, latitude, longitude, vessel, mean bluefish prey weight, mean piscivore length, number of piscivore species, and sea surface temperature (degrees C).
Initial dataset checks revealed that approximately 10% of survey stations were missing in-situ sea water temperature measurements. Gaps in temperature information were more prevalent early in the time series (1980s and early 1990s), although stations without temperature data were found in nearly all years (see “SST mismatch by year” section at this link). Rather than truncate the dataset to only those stations with in-situ temperature measurements, we investigated other sources of SST data to fill gaps.
Two SST data sources were investigated, both based on satellite data: the National Oceanic and Atmospheric Administration Optimum Interpolation Sea Surface Temperature (NOAA OI SST) V2 High Resolution Dataset (Reynolds et al., 2007) data provided by the NOAA PSL, Boulder, Colorado, USA, from their website at https://psl.noaa.gov, and the higher resolution source data for the OI SST, the AVHRR Pathfinder SST data linked here (Saha et al., 2018). Both sources provide global daily SST at different spatial resolutions (OI SST uses a 25 km grid, and AVHRR uses a 4 km grid) from 1981-present.
The OI SST data are provided as global files for each year. Files for years 1985-2021 were downloaded from https://downloads.psl.noaa.gov/Datasets/noaa.oisst.v2.highres/sst.day.mean.[year].v2.nc as rasters using code developed by Kim Bastille for Northeast US ecosystem reporting, cropped to the Northeast US spatial extent, and converted to R dataframe objects where the temperature of a grid cell is associated with the coordinates at the center of the grid cell. Then, OI SST temperature was matched to the survey data using year, month, day and spatial nearest neighbor matches to survey station locations.
AVHRR data is organized into year/data folders with 2 nc files for each date, one day and one night. A list of survey year-month-day combinations was generated from the piscivore diet dataset to download only AVHRR daily files within \(\pm\) 2 days of survey dates from https://www.ncei.noaa.gov/data/oceans/pathfinder/Version5.3/L3C/[year]/data/. The “sea_surface_temperature” and “quality_level” fields were extracted as rasters, cropped to the Northeast US spatial extent, appended into an annual dataset and saved as R dataframe objects similar to the OI SST methods. Examination of a subset of AVHRR SST data with quality level 3 and above showed extended periods with little SST data within the survey domain, likely due to cloud cover (see “SST AVHRR 2021 survey dates” section at this link). Therefore, to match AVHRR SST to survey stations, methods for combining and filling temporal and spatial gaps would be required, which was beyond the scope of the current analysis.
For survey stations with in-situ temperature measurements, the in-situ measurement was retained. For survey stations with missing temperature data (~10% of all stations), OI SST was substituted for input into VAST models.
We will use VAST (Thorson and Barnett, 2017; Thorson, 2019) to evaluate changes in bluefish prey biomass and distribution over time. VAST is structured to estimate fixed and random effects across two linear predictors, which are then multiplied to estimate an index of the quantity of interest. A full model including fixed intercepts (\(\beta\)), spatial random effects (\(\omega\)), spatio-temporal random effects (\(\varepsilon\)), vessel effects (\(\eta\)), density or habitat covariates, and catchability covariates.
\[ \rho_1(i) = \beta_1(c_i, t_i) + \omega_1^*(s_i, c_i) + \varepsilon_1^*(s_i, c_i, t_i) + \eta_1(v_i, c_i) + \nu_1(c_i, t_i) + \zeta_1(i) - \iota(c_i, t_i) \]
Thorson (2019) lists 15 major decisions for constructing a VAST model.
So is it the whole Northwest Atlantic grid and should I then rerun the REML model selection one more time? to get the same answer…
Univariate model to produce univariate forage index to be most useful in assessment model.
Following what Ng et al. (2021) did for herring, we apply a Poisson-link delta model to estimate expected prey mass per predator stomach. However, we use 500 knots, estimated by k-means clustering of the data, to define the spatial dimensions of each seasonal model. We compared the AIC for the 500 knot models to see if including the spatial and spatio-temporal random effects in the first and second linear predictors improved the model fit. Model structures tested include with and without anistropy (2 fixed parameters), and with and without spatial and spatio-temporal random effects in the second linear predictor or both linear predictors. This follows the model selection process outlined in Ng et al. (2021) using REML.
Following this, we evaluated catchability covariates using AIC to determine which are best supported by the data. We used the structure selected by the REML model selection, then evaluated vessel effects (overdispersion) and a range of potential catchability covariates using maximum likelihood to calculate AIC instead of REML, because we are not changing the random effects. Catchability covariates explored included mean predator length at each station, number of predator species at each station, and sea surface temperature (SST) at each station.
The predator length covariate may more directly model vessel differences in predator catch that affect stomach contents than modeling a vessel catchability covariate directly. This was the approach taken by Ng et al. (2021). They found that predator length covariates were strongly supported as catchability covariates (larger predators being more likely to have more prey in stomachs). The rationale for including number of predator species is that more species “sampling” the prey field at a particular station may result in a higher encounter rate (more stations with positive bluefish prey in stomachs). Water temperature was also included as a potential catchability covariate, because temperature affects predator feeding rate.
Our main goal is to determine whether bluefish prey availability has changed in inshore waters where the recreational fishery primarily operates. Our food habits datasets do not extend into inland waters such as bays and sounds, with the exception of Cape Cod Bay. (In the future we might be able to investigate food habits from ChesMMAP or surveys south of Cape Hatteras.) However, there is data from both historical NEFSC surveys and NEAMAP in state coastal waters (0-3 miles from shore), and offshore across the continental shelf.
The model has been partitioned into several definitions of “inshore” and “offshore” for the stock assessment inputs. First we define a partition that is the MAB and GB areas only as the GOM is not relevant to the bluefish assessment (yet). This is called MABGB, while the full model is AllEPU. Within this partition,
Survey strata definitions are built into the VAST
northwest-atlantic extrapolation grid already. We defined
additional new strata to address the recreational inshore-offshore 3
mile boundary.
The list of bluefish prey derived from the most common identifiable prey items in NEFSC diet database (Table 1) includes the majority of bluefish diet composition by decade (Fig. ) and season (Fig. ). Colors in the plots show included prey, while gray sections represent “fish unidentified” and other categories not included in this analysis. Even when evaluated annually and seasonally (where bluefish sample sizes may be small for a year-season combination), a majority of observed diet is included in the dataset for analysis (Fig. ).
The full NEFSC diet database 1985-2021 contains 24984 survey stations with diet collections. When including only piscivores feeding similarly to bluefish, the survey stations with diet collections in this time period is 22186. Of these piscivore diet stations, 8825 included our defined bluefish prey, or 39.7773371%. For comparison, 1806 stations have diet samples for bluefish alone, with 903 or 50% including our defined bluefish prey.
NEAMAP survey stations with diet collections for piscivores (n = 3838) had a higher proportion with our defined bluefish prey (n = 2418, 63.0015633%).
[add stuff about how many SST values were missing and which years got substituted]
[in progress]
Using REML, models including spatial and spatio-temporal random effects as well as anistrophy were best supported by the data.
Catchability covariates were better supported by the data than vessel effects. Model comparisons led us to the best model fit using mean predator length, number of predator species, and SST at a survey station as catchability covariates.
The model has been partitioned into several definitions of “inshore” and “offshore” for the stock assessment inputs. First we define a partition that is the MAB and GB areas only as the GOM is not relevant to the bluefish assessment (yet). This is called MABGB, while the full model is AllEPU. Within this partition,
Survey strata definitions are built into VAST already. The results presented here pertain only to survey strata; I’ll consider the 3 mile split separately in a different model run. I can’t find a way to have VAST do all the partitions at once, because the 3 mile split cuts across survey stratum boundaries.
Definitions for strata used in the script
bluefishdiet/VASTunivariate_bfp_allsurvs_lencovSST_inoffsplit.R:
Bias correction is based on (Thorson and Kristensen, 2016).
Run this script, the same used to test catchability covariates, but
now has all three covariates (predator mean length, number of predator
species, and SSTfill) included and bias.correct=TRUE:
Make a lookup table:
Plot individual time series (bias corrected) with standard errors:
or as proportions (here proportion of MABGB index).
Fall density maps with covariates
Plot individual time series
or just the indices from inshore (alb), inshore bluefish, offshore bluefish, and further out
or as proportions (here proportion of MABGB index).
Spring density maps with covariates
All results in the respective
pyindex/allagg_fall_500_lennosst_split_biascorrect and
allagg_spring_500_lennsst_splitbiascorrect folders.
The full results are on google drive rather than github to save space.
Still to do: